IMPROVED AUDIO PROCESSING SYSTEM AND RECORDINGS

MADE THEREBY 

Background of the Invention 

The present invention relates to acoustical processing systems and more 
particularly to processing systems in which the acoustical output from a single 
sound source is processed to produce a plurality of channels. 

Consider a sound processing system which takes a single channel sound 
source and produces two channels therefrom. For example, the processing 
system could be a stereophonic system in which the sound source is sensed by 
two microphones whose outputs are then processed to produce left and right 
channels for eventual playback on left and right speakers or headphones. Alter- 
natively, the output of a single microphone could be electronically processed to 
produce the left and right channels. Such a system is described in U.S. Patent 
3,670,106. 

In the case of stereophonic systems, the goal of the processing system is to 
create the illusion of a sound source of a predetermined size located at a specific 
position relative to the speakers. The perceived locations of the various sound 
sources generated by the stereophonic signals create for the listener what is 
known as an acoustic image, i.e., a map of the imaginary physical locations of 
these sound sources. The apparent location of the sound source is largely 
determined by the difference in arrival time and the intensity of the relevant 
component signals generated in the left and right speakers. 

In prior art stereophonic sound systems, the illusion of a sound source of
any specific size is difficult to generate. Some prior art systems
utilize reverberation to broaden the sound image. Others utilize 180-degree
phase shifts.

Shimada (U.S. Patent [number illegible in source]) and Doi, et al. (U.S. Patent [number illegible in source])
describe a stereophonic reproduction system in which portions of the input 
signals are scaled by a constant, k, and cross-fed in 180-degree out-of-phase 
relationships. That is, given left and right input signals $a_L(t)$ and $a_R(t)$, left and
right output signals $L = a_L(t) - k\,a_R(t)$ and $R = a_R(t) - k\,a_L(t)$ are generated. When L
and R are presented over two loudspeakers, a listener located between the
loudspeakers perceives a broadened sound image.
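
The cross-feed operation described above can be sketched in a few lines. This is only an illustration, assuming the outputs take the form L = a_L - k*a_R and R = a_R - k*a_L applied sample by sample; the function name `cross_feed` is hypothetical and does not come from the cited patents.

```python
def cross_feed(a_left, a_right, k):
    """Return (L, R) where L = a_L - k*a_R and R = a_R - k*a_L, sample by sample.

    Assumption: both channels are equal-length sequences of samples and
    k is the constant scale factor applied to the cross-fed signal.
    """
    if len(a_left) != len(a_right):
        raise ValueError("channels must have equal length")
    out_left = [l - k * r for l, r in zip(a_left, a_right)]
    out_right = [r - k * l for l, r in zip(a_left, a_right)]
    return out_left, out_right


if __name__ == "__main__":
    # Material common to both channels is attenuated by the subtraction;
    # with k = 1 it cancels completely.
    print(cross_feed([2.0, 3.0], [2.0, 3.0], 1.0))
```

Because the subtraction acts on whatever the two channels have in common, the result depends strongly on the program material, which is consistent with the timbral problems attributed to such systems.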

These types of systems are problematic in that they often alter the timbral 
quality of the program material. The summation of the signals used to provide 
the output signals results in constructive and destructive interference. This 
interference alters the perceived timbre of the sound. In addition, the acoustical 
images created often appear broken, and the effects are highly dependent on the 
listener's location relative to the loudspeakers. The magnitude of these prob- 
lems depends critically upon the program material; hence, it is impossible to 
compensate for the distortions through further processing of the resulting sig- 
nals. As a result, listeners at different locations hear quite different effects in 
timbre, image width, and image location. 


In addition, these systems suffer from two other problems. First, the 
apparent distance of the sound source is limited to locations on a line between 
the speakers. For example, the illusion of a sound source located between the 
speakers and the listener can not be produced without utilizing additional speak- 
ers closer to the listener. 

Second, the perceived location of the sound source depends critically on 
the location of the listener relative to the speakers. Thus, if a particular signal 
component is fed to both speakers with no relative delay and the same signal 
amplitude, the component of the acoustic image created by that signal will 
appear to be located on a line centered between the two speakers. If that signal 
component arrives fractionally earlier from the left speaker than from the right 
and/or the intensity of the component from the left speaker is greater than that 
from the right speaker, its image component will appear to be located left of 
center. The apparent locations of a set of such image components make up the
composite acoustic image perceived by the listener. 

In typical listening environments such as living rooms or theaters, most 
listeners are located nearer to one loudspeaker than to the other(s). For the 
purposes of the following discussion, it will be assumed that the acoustic image 
is being produced by only two loudspeakers. If the listener moves nearer to one
loudspeaker, the sound from that speaker is more intense and reaches the
listener ahead of the sound generated at the same time in the other speaker.
Hence, moving the listener closer to one speaker is equivalent to introducing
an intensity loss and time delay into the material being reproduced in the
other speaker. When very similar material is reproduced by two or more
loudspeakers, listeners report that the sound images they perceive are either
shifted toward the location of the nearest loudspeaker or almost entirely
located in the nearest loudspeaker, depending upon the delay in question.

It should be noted that when a listener moves nearer to one speaker, both 
the intensity of the sound and the time delay are affected. It has been shown 
that the arrival time difference has a more pronounced and important influence 
than does the intensity difference. 

If the time delay is less than approximately 1.0 msec, listeners describe
hearing a single sound image located between the speakers, but shifted toward
the closer speaker. This effect is referred to as image shift. If the time delay is
greater than approximately 1.0 msec but less than an upper limit discussed
below, the listener perceives a single sound image that is located at the closer
loudspeaker. The traditional explanation for this phenomenon is that the
listener's auditory system has attempted to suppress the delayed signal. This
phenomenon is often referred to as the precedence effect, the Haas effect, or the
law of the first wavefront. In the following discussion, the effect will be
referred to as the precedence effect.

There is an upper limit to the time delay at which the precedence effect 
operates. At time delays greater than this limit, the delayed sound is heard. 
The exact magnitude of this upper limit depends upon the qualities of the sound 
source. The precedence effect is more pronounced for transient sound sources 
such as struck or plucked musical instruments than it is for continuous sound 
sources such as blown or bowed musical instruments. The upper limit is found 
experimentally to vary from 8 to 70 msec with a typical limit being about 15 
msec. 
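
The delay regimes described above can be summarized in a short sketch. It assumes a speed of sound of 343 m/s (not stated in the text) and uses the approximate 1.0 msec boundary and the typical 15 msec upper limit quoted above; the function names and the returned labels are hypothetical.

```python
def arrival_delay_ms(near_m, far_m, speed_of_sound=343.0):
    """Delay of the farther loudspeaker's sound relative to the nearer one.

    Assumption: free-field propagation at speed_of_sound metres per second.
    """
    return (far_m - near_m) / speed_of_sound * 1000.0


def perceived_effect(delay_ms, upper_limit_ms=15.0):
    """Classify a delay using the approximate thresholds given in the text.

    upper_limit_ms defaults to the typical 15 msec value; the text reports
    that it varies from 8 to 70 msec depending on the sound source.
    """
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if delay_ms < 1.0:
        return "image shift"  # a delay of exactly 0 corresponds to no shift
    if delay_ms <= upper_limit_ms:
        return "precedence effect"
    return "delayed sound audible"


if __name__ == "__main__":
    delay = arrival_delay_ms(2.0, 3.5)
    print(round(delay, 2), perceived_effect(delay))
```

The thresholds are soft in practice, so the classification should be read as a rough guide rather than a sharp boundary.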


When the precedence effect releases, listeners report that the sound image
is located in two loudspeakers. When the loudspeakers are separated by a
sufficiently great distance, listeners report hearing two sound images, one of
which is echo-like. As the time delay from the difference in distances to the two
loudspeakers increases further, the intensity difference also increases
significantly. When the intensity difference is approximately 15 dB, the more distant
loudspeaker becomes difficult to hear. At this point, listeners report that the
sound image is located in one loudspeaker.

A further example of a processing system in which a single sound source 
is processed for reproduction through a number of loudspeakers is a public 
address system. In such systems, a monophonic signal is reproduced through a 
plurality of loudspeakers to provide a sound field which covers a large area. 
These systems suffer from problems of a different type. In those areas in which 
the acoustical signals produced by different loudspeakers overlap, constructive 
and destructive interference occurs. The particular frequencies at which these
different interference patterns occur are determined by the distance from each of
the speakers to the location of the listener. Hence, the sound field at every
point in the room will appear to be filtered by a set of frequency filters whose
pass-band frequencies depend on the location relative to the speakers. This is
equivalent to timbrally shifting the original material. Such added coloration is
undesirable, since it reduces the intelligibility of the material being broadcast as
well as altering the fidelity of the reproduction.
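
The position-dependent filtering described above can be illustrated for the simplest case. The sketch assumes two loudspeakers radiating the same signal at equal level, free-field propagation, and a speed of sound of 343 m/s; none of these values appear in the text, and the function name is hypothetical. Under those assumptions, reinforcement occurs where the path difference is a whole number of wavelengths and cancellation where it is an odd number of half wavelengths.

```python
def interference_frequencies(path_diff_m, f_max_hz, speed_of_sound=343.0):
    """Return (peaks, notches) in Hz up to f_max_hz for a given path difference.

    Assumption: two equal-level copies of one signal, so that
    peaks fall at n * c / d and notches at (n + 1/2) * c / d.
    """
    if path_diff_m <= 0:
        raise ValueError("path difference must be positive")
    spacing = speed_of_sound / path_diff_m
    peaks, notches = [], []
    n = 0
    while True:
        notch = (n + 0.5) * spacing
        if notch > f_max_hz:
            break
        notches.append(notch)
        peak = (n + 1) * spacing
        if peak <= f_max_hz:
            peaks.append(peak)
        n += 1
    return peaks, notches


if __name__ == "__main__":
    # A listener 0.343 m closer to one loudspeaker than to the other.
    peaks, notches = interference_frequencies(0.343, 3200.0)
    print([round(f) for f in peaks], [round(f) for f in notches])
```

Moving the listener changes the path difference and therefore moves every peak and notch, which is why the coloration differs from seat to seat.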

This problem is not limited to monophonic public address systems. Ster- 
eophonic systems designed to fill large halls with acoustical sound fields often 
suffer from this effect. 

Broadly, it is an object of the present invention to provide an improved 
audio processing and reproduction system. 

It is yet another object of the present invention to provide a stereophonic 
system in which the acoustic images are less dependent on the location of the 
listener relative to the speakers than are the images produced by prior art sys- 
tems. 


It is a further object of the present invention to provide a stereophonic
system which provides the illusion of a sound source located between the
speakers and the listener.

It is yet another object of the present invention to provide a sound
reproduction system for filling a large room with a sound field such that listeners in
different parts of the room perceive the same sound field.

It is a still further object of the present invention to provide a sound 
reproduction system which allows the user to control the apparent width and 
distance of the sound source without adding reverberation or timbre changes. 

It is yet another object of the present invention to provide a sound repro- 
duction system which allows the user to control the apparent width and distance 
of the sound source with a minimum of two loudspeakers. 

It is a still further object of the present invention to provide a sound
processing system which allows the apparent width and location of the acoustical
image generated thereby to be varied without introducing timbral shifts or
causing the image to appear broken.

These and other objects of the present invention will become apparent to 
those skilled in the art from the following detailed description of the invention 
and the accompanying drawings. 

Brief Description of the Drawings 

Figure 1 is a block diagram of an audio processing system according to 
the present invention. 

Figure 2 is a block diagram of one embodiment of a phase processor 
according to the present invention. 

Summary of the Invention

The present invention comprises an apparatus for audio processing, a
method of audio processing, and a recording made by said method. An audio
processing system according to the present invention generates a plurality of
output signals from a sound source input signal. The system comprises circuitry
for receiving the sound input signal and for generating a plurality of channel
signals therefrom. One of said channel signals comprises a signal which is
substantially equal to the sum of M band-limited signals, the ith said
band-limited signal having an amplitude substantially equal to that of said input
signal in a predetermined frequency range $f_i \pm \delta f_i$ and a phase which differs from the
phase of said input signal in said predetermined frequency range by an amount
$\phi_i$, i running from 1 to M, wherein M>2 and $\phi_i$ is chosen between $P - \delta P$ and
$P + \delta P$, wherein $\phi_i$ is a rapidly varying function of i.

Detailed Description of the Invention 

For the purpose of the following discussion, it will be assumed that the
present invention operates on a single input signal to produce two output signals.
The output signals may be channels or may be combined with other material to
produce the final channels. The manner in which the present invention would
operate to produce more than two output signals will be explained in more detail
below.

The present invention provides its beneficial effects by altering the cross- 
correlation of the output signals while minimizing any timbral shifts between the 
input signal and the output signals. 

The cross-correlation of two signals, $y_1(t)$ and $y_2(t)$, is typically measured
in terms of a cross-correlation measure which is defined to be the extreme value
of the cross-correlation function $\Omega(\tau)$, where

$$\Omega(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} y_1(t)\, y_2(t+\tau)\, dt \qquad (1)$$

The cross-correlation measure has a maximum possible value of 1 and a
minimum possible value of -1. As will be made clear in the following discussion, it
is also important to consider simultaneously both the positive and the negative
peaks of the cross-correlation function.
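
For sampled signals, the cross-correlation measure can be estimated as sketched below. The sketch makes assumptions that are not part of the text: the integral is replaced by a finite sum over the available samples, the result is normalized by the signal energies so that the values lie between -1 and 1, and only lags up to a chosen `max_lag` are searched. The function names are hypothetical.

```python
import math


def cross_correlation(y1, y2, max_lag):
    """Normalized cross-correlation of two sample sequences for each lag.

    Returns a dict mapping lag (in samples) to sum(y1[t] * y2[t + lag])
    divided by sqrt(energy(y1) * energy(y2)).
    """
    norm = math.sqrt(sum(v * v for v in y1) * sum(v * v for v in y2))
    if norm == 0:
        raise ValueError("signals must have non-zero energy")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(len(y1)):
            u = t + lag
            if 0 <= u < len(y2):
                total += y1[t] * y2[u]
        result[lag] = total / norm
    return result


def correlation_measure(y1, y2, max_lag):
    """Return (measure, positive_peak, negative_peak).

    The measure is whichever peak has the larger magnitude, keeping its sign.
    """
    values = cross_correlation(y1, y2, max_lag).values()
    positive_peak = max(values)
    negative_peak = min(values)
    if abs(positive_peak) >= abs(negative_peak):
        return positive_peak, positive_peak, negative_peak
    return negative_peak, positive_peak, negative_peak


if __name__ == "__main__":
    y = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]
    print(correlation_measure(y, y, 3))
    print(correlation_measure(y, [-v for v in y], 3))
```

Returning both peaks alongside the measure reflects the remark above that the positive and negative peaks should be considered together.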




The manner in which the apparatus of the present invention operates may
be most easily understood with reference to Figures 1 and 2. Figure 1 is a block
diagram of an audio processing system 100 according to the present invention.
Audio processing system 100 receives an input signal from a sound source 101 and
produces signals on a plurality of output channels of which output channels
131-134 are exemplary. The input signal from sound source 101 may be in the
form of an electric signal or sound waves.

The signal input to audio processor 100 is processed by pre-processor 102
to form a plurality of signals which, after further processing, are incorporated
into the signals on the output channels. For the purpose of this discussion, the
number of output channels will be denoted by $N_{out}$. If the input signal is in the
form of sound waves, pre-processor 102 includes one or more microphones to
convert the sound waves to electrical signals. Pre-processor 102 may be as
simple as an electrical junction for dividing the input signal into $N_{out}$ signals.

$N_{out} - 1$ of the signals generated by pre-processor 102 are input to
phase-processors of which phase-processors 104-106 are exemplary. The remaining
signal may either be input to a phase-processor or to a delay circuit 107.

The manner in which the phase-processing circuit operates may be most 
easily understood with reference to Figure 2 which is a block diagram of a 
phase-processor 200 according to the present invention. Phase-processor 200 
converts an input signal x(t) to a phase-processed output signal y(t) by altering
the phase of various frequency components of x(t) while leaving the amplitude 
of the signal in the various components substantially unchanged. 

The output signal is generated by dividing the input signal into M
components, each component matching the intensity of the signal in a specific
frequency band. Apparatus 200 utilizes a plurality of band-pass filters 12 for this
purpose. The signal in the ith frequency band is then phase-shifted by an amount $\phi_i$
utilizing a phase shifting network 14. As will be explained in more detail
below, the specific $\phi_i$ values utilized will depend on the particular application in
which audio processor 100 is being utilized. The $\phi_i$ are provided by controller
112 shown in Figure 1.


It is important that each of the band-pass filters preserves the phase of the
frequency component of x(t) selected by the filter in question. The
phase-shifted signals are then summed by signal adder 16 to form output signal y(t).
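
The band-splitting, phase-shifting, and summing steps can be sketched digitally as follows. This is a minimal illustration under assumptions that do not come from the text: the bands are ideal and non-overlapping, formed by grouping discrete Fourier transform bins rather than by band-pass filters 12 and phase shifting network 14, and a slow direct transform is used so that the example needs only the standard library. The names `phase_process`, `band_edges_hz`, `phases_rad`, and `sample_rate_hz` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(n^2); adequate for a short sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def phase_process(x, band_edges_hz, phases_rad, sample_rate_hz):
    """Shift the phase of each band of x while leaving bin magnitudes unchanged.

    band_edges_hz holds M + 1 ascending edges; phases_rad holds the M shifts.
    Bins outside every band (including DC and Nyquist) pass through unshifted.
    """
    if len(band_edges_hz) != len(phases_rad) + 1:
        raise ValueError("need exactly one more band edge than phase shifts")
    n = len(x)
    spectrum = dft(x)
    shifted = list(spectrum)
    for k in range(1, (n + 1) // 2):  # positive-frequency bins only
        freq = k * sample_rate_hz / n
        for i, phi in enumerate(phases_rad):
            if band_edges_hz[i] <= freq < band_edges_hz[i + 1]:
                rotation = cmath.exp(1j * phi)
                shifted[k] = spectrum[k] * rotation
                # Mirror bin gets the conjugate rotation so the output stays real.
                shifted[n - k] = spectrum[n - k] * rotation.conjugate()
                break
    return idft(shifted)


if __name__ == "__main__":
    n, rate = 16, 16.0
    x = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]  # 2 Hz sine
    y = phase_process(x, [0.0, 4.0, 8.0], [math.pi / 2, 0.0], rate)
    print([round(v, 3) for v in y])  # the sine advanced by a quarter cycle
```

Since each bin is only rotated, the magnitude spectrum of the output equals that of the input, which corresponds to the requirement that the amplitude in each component remain substantially unchanged.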


Returning to Figure 1, the output of each phase-processor may be subjected
to some form of post-processing. Hence, optional post-processing circuits
121-124 are shown in Figure 1. The post-processing in question may include
amplifying the signals and/or mixing the signals with other signals derived from
other sound sources. For example, additional stereophonic effects may be
obtained by amplifying one channel relative to the remaining channels, thereby
creating the illusion of a sound source closer to the speaker through which the
corresponding output channel is played.

The output signal in the ith output channel will be denoted by $y_i(t)$. To
simplify the following discussion, it will be assumed that there are only two
output channels and that delay circuit 107 is utilized in the second output
channel.

The cross-correlation measure of the output signals, $y_1(t)$ and $y_2(t)$, is
determined by the phase shifts that were added to the various frequency
components of x(t). In the preferred embodiment of the present invention, the
$\phi_i$ are chosen randomly between two limits which will be defined to be $P - \delta P$ and
$P + \delta P$, respectively. Since $y_2(t)$ is merely x(t) delayed by an amount to be
discussed below, the $\phi_i$ are the phase differences between $y_1(t)$ and $y_2(t)$ in
the various frequency bands. Other methods for choosing the phase shifts will
be described below.

The value of P (modulo $2\pi$) determines the relative balance between the
positive and negative peaks in the cross-correlation function. When P is equal
to zero, the positive peak is at its maximum (close to 1) and the negative peak is
at its minimum (close to 0). When P is equal to $\pi$, the positive peak is at its
minimum (close to 0) and the negative peak is at its maximum (close to -1).
When P is close to $\pi/2$ or $3\pi/2$, the positive and negative peaks are
approximately of equal magnitude.

If a positive cross-correlation measure is to be obtained, then $-\pi/2 < P < \pi/2$.
A negative cross-correlation measure is obtained when $\pi/2 < P < 3\pi/2$.
When P is approximately equal to $-\pi/2$ or $\pi/2$, the negative and positive peaks in
the cross-correlation function are very close in magnitude and the
cross-correlation measure could be positive or negative.

It has been found experimentally that P determines the image distance
through the control of the ratio of the positive and negative peaks in the
cross-correlation function. In loudspeaker reproduction, when P = 0, the image is
close to the loudspeakers. As P increases from 0 to $\pi$, the image moves closer
to the listener. At values near $\pi$, the image will appear to be close to the
head, inside the head, or behind the head. As the value of P increases from $\pi$ to
$2\pi$ or as it decreases from $\pi$ to 0, the image moves back toward the
loudspeakers.

The effect of P is approximately symmetrical about $\pi$, but not entirely.
For $0 < P < \pi$, the positive peak in the cross-correlation function leads the
negative peak. For $\pi < P < 2\pi$, the negative peak leads the positive peak.
Listeners report differences in the absolute distance of the sound source in these
two conditions.

It may also be shown that $\delta P$ determines the magnitude of the positive
and/or negative peaks in the cross-correlation function. When $\delta P$ is 0, the
magnitudes of the peaks in the cross-correlation function are at their maximum (close
to +/- 1, but dependent on the value of P). As the value of $\delta P$ increases from
0 to $\pi$, the magnitudes of the peaks in the cross-correlation function decrease.
When $\delta P$ is equal to $\pi$, the magnitudes of the peaks in the cross-correlation
function are at their minimum (close to zero regardless of the value of P).
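
A simplified calculation shows the same trend. It rests on assumptions that are not part of the text: ideal non-overlapping bands, a second channel that is an undelayed copy of the input, and attention restricted to the zero-lag value of the normalized cross-correlation rather than its peaks at other lags. Under those assumptions the zero-lag value is the energy-weighted average of the cosines of the band phase shifts, and for shifts spread uniformly between P - dP and P + dP its expected value is cos(P) * sin(dP) / dP. The function names are hypothetical.

```python
import math


def zero_lag_correlation(phases, energies=None):
    """Energy-weighted mean of cos(phase) over the bands.

    Assumption: ideal bands, so each band contributes energy * cos(phase)
    to the zero-lag correlation with the unshifted input.
    """
    if energies is None:
        energies = [1.0] * len(phases)
    total = sum(energies)
    return sum(e * math.cos(p) for e, p in zip(energies, phases)) / total


def expected_zero_lag(p, delta_p):
    """Expected zero-lag value for shifts uniform on [p - delta_p, p + delta_p]."""
    if delta_p == 0:
        return math.cos(p)
    return math.cos(p) * math.sin(delta_p) / delta_p


if __name__ == "__main__":
    for delta_p in (0.0, math.pi / 4, math.pi / 2, math.pi):
        print(round(delta_p, 3), round(expected_zero_lag(0.0, delta_p), 3))
```

With P = 0 the value falls from 1 toward 0 as the spread grows from 0 to pi, matching the behavior described above, although the simplification says nothing about peaks away from zero lag.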

It is found experimentally that $\delta P$ determines the perceived image width
through control of the magnitude of the peaks in the cross-correlation function.
In loudspeaker reproduction, when $\delta P$ = 0, the image is narrow and tightly
focused. As $\delta P$ increases from 0 to $\pi$, the image becomes wider and more
spatially diffuse. At values near $\pi$, the image will appear to extend from
one speaker to the other. When $\delta P$ is close to $\pi$, the magnitude of P ceases to
have any substantial effect in controlling the apparent location of the image
between the listener and the speakers. In this case, the sound is perceived as
originating from a broad sound source located between the speakers.

Hence, the present invention may be utilized to control both the image
width and distance. P is selected in order to provide the desired image distance.
$\delta P$ is selected in order to provide the desired image width. This may be
accomplished by constructing a two-dimensional calibration curve for P as a function
of image distance and $\delta P$ as a function of image width, wherein the choices of P
and $\delta P$ also depend on each other.

The manner in which the phase shifts are chosen between the limits
specified by P and $\delta P$ is important in determining the quality of the output
signals. In the preferred embodiment of the present invention, the $\phi_i$ are chosen by
generating a sequence of random numbers between the limits in question.
Because of the finite number of frequency bands, it is found that different sets of
random numbers produce slightly different effects. Hence, in the preferred
embodiment of the present invention, a number of different sets of phase shifts
are generated and the set producing the best effect, as judged by listening to the
output signals, is selected.
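
Generating the candidate sets is straightforward to sketch. The example assumes a uniform distribution between the two limits and uses hypothetical names (`random_phase_set`, `candidate_phase_sets`); the final choice among the candidates is made by listening and is therefore not represented in code.

```python
import random


def random_phase_set(num_bands, p, delta_p, rng):
    """One phase shift per band, drawn uniformly from [p - delta_p, p + delta_p]."""
    return [rng.uniform(p - delta_p, p + delta_p) for _ in range(num_bands)]


def candidate_phase_sets(num_sets, num_bands, p, delta_p, seed=0):
    """Several independent candidate sets for later comparison by ear.

    The seed is only there to make a given batch of candidates repeatable.
    """
    rng = random.Random(seed)
    return [random_phase_set(num_bands, p, delta_p, rng) for _ in range(num_sets)]


if __name__ == "__main__":
    for phases in candidate_phase_sets(3, 5, p=0.0, delta_p=1.0):
        print([round(v, 2) for v in phases])
```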

Although the preferred embodiment of the present invention utilizes
randomly selected phase shifts, other methods of choosing the phase shifts in
question may be utilized without departing from the teachings of the present
invention. Some of these methods are discussed below. In choosing a set of
phase shifts within the range specified by P and $\delta P$, it is important that the phase
shifts change direction frequently from band to band. Here, the phase shifts
associated with two bands are said to change direction if the signal to the left
speaker lags that to the right speaker in the first band while the signal to the left
speaker leads that to the right speaker in the second band, or vice versa. As
will be discussed in more detail below, this requirement is needed to prevent the
perception of a "banded" or "broken" output signal. Consider three contiguous
frequency bands having phase shifts $\phi_{i-1}$, $\phi_i$, and $\phi_{i+1}$. On average, the
change in phase shift should not be monotonic. That is, if $\phi_{i-1} > \phi_i$ then, on
average, $\phi_i < \phi_{i+1}$. Similarly, if $\phi_{i-1} < \phi_i$ then, on average, $\phi_i > \phi_{i+1}$.
Clearly, because of the random manner in which the phase shifts are chosen,
there will be cases for which three consecutive phase shifts will be monotonic.
However, on average this condition should be met.

To better understand the need for this requirement, consider the case in
which one wishes to create the illusion of a physically broad sound source
emitting sound along its surface between the two speakers. A sound component
having a positive phase shift will be perceived as originating from a source
which is closer to one speaker. A sound component having a negative phase
shift will be perceived as originating from a source which is closer to the other
speaker. The exact position at which each of the components is perceived will
depend on the magnitude of the phase shift in question. Hence, the present
invention produces a sound "image" that appears to emanate from a source that
is made up of a collection of discrete sound components, each emitting sound in
a specific frequency band and being located at a different position relative to the
speakers. This requirement assures that, on average, signals from contiguous
frequency bands will be perceived as originating from non-contiguous sources
between the speakers.
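
One way to screen a candidate set against this requirement is to count how often three consecutive phase shifts rise or fall steadily. The sketch below is an assumption about how such a check might be scored; the text states only that the condition should hold on average and gives no numerical threshold. The function name is hypothetical.

```python
def monotonic_triple_fraction(phases):
    """Fraction of consecutive triples that are strictly increasing or decreasing.

    A value near 0 means the shifts change direction at almost every band;
    a value near 1 means long monotonic runs.
    """
    triples = 0
    monotonic = 0
    for a, b, c in zip(phases, phases[1:], phases[2:]):
        triples += 1
        if a < b < c or a > b > c:
            monotonic += 1
    return monotonic / triples if triples else 0.0


if __name__ == "__main__":
    print(monotonic_triple_fraction([0.1, -0.2, 0.3, -0.4]))
    print(monotonic_triple_fraction([0.0, 1.0, 2.0, 1.0]))
```

A set with a high fraction could be discarded before any listening comparison is made.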

The distribution of phase shifts will determine the spatial distribution of
sound components. If the phase shift distribution is not uniform in phase, the
spatial distribution will not be uniform in space. A uniform spatial distribution
is desired since it is found experimentally that such a distribution remains
uniform when the listener moves from the center line between the loudspeakers to a
point off the center line. For example, when a listener is located left of the
center line, sound from the left loudspeaker arrives before sound from the right
loudspeaker, which introduces a time delay in the arrival of sound between the two
ears. This time delay affects the phase difference at each frequency differently.
A uniform distribution of phase provides the greatest assurance that the sound
image is not altered by the time delay, since it results in another uniform
distribution of phase.

The above discussion deals only with the phase shifts, $\phi_i$. The manner in
which the width of the bands is selected will now be discussed. If the bands are
too broad, the listener will perceive a broken or banded image. However, if the
bands are made too narrow, other problems are encountered.

As noted above, timbral shifts (so-called coloration) of the output signal
relative to the input signal are to be avoided. These shifts arise from constructive
and destructive interference. Such interference can arise from two independent
sources. First, the frequency bands into which the sound is divided have small
overlaps. The degree of overlap depends on the specific filtering system used.
Consider such an overlap. The frequencies in the overlap region are contained
in two adjacent bands. Each band has a different phase shift; hence, the overlap
region will have components with different phase shifts at the same frequency.
Depending upon the difference in phase shifts, there will be either constructive
or destructive interference at the overlap frequencies when the signals from the
two bands are added back together after the phase shifting operation. This
effect is minimized by choosing the broadest possible bands since the degree of
overlap is relatively independent of the bandwidth.

The second source of interference will be referred to as spatial interfer- 
ence. When loudspeakers are utilized to reproduce the channels, the listener 
will receive overlapping sound fields, each field being generated by a different 
loudspeaker. At any given frequency, the signals from the two speakers will be 
perfectly correlated, since they differ only by a phase shift which depends on the 
frequency in question. Hence, there will be either constructive or destructive
interference between the signals depending upon the phase shift in question.

In addition, if the listener is not located on the center line between the
speakers, there will be an additional phase shift added at each frequency. The
additional phase shift results from the difference in distances between the
listener and each speaker. For example, if the listener is closer to the right speaker,
the signal from the left speaker will be delayed by a time equal to the difference
in distance divided by the speed of sound. This time delay is equivalent to a
frequency-dependent phase shift being added to the output of one of the
speakers. This added phase shift changes as the listener moves relative to the
loudspeakers. Hence, at any given location relative to the loudspeakers, the listener is
located in a sound field consisting of the sum of two signals having phase shifts
which depend on the location of the listener and sound frequency. These signals
will interfere with one another and produce a second timbral shift pattern which
depends on the location of the listener.




It is known from psycho-acoustical research that there is a critical
bandwidth below which the human ear cannot discriminate. The critical bandwidth
depends on frequency, varying from approximately 100 Hz at low frequencies
(<2000 Hz) to approximately one seventh the center frequency of the band in
question at high frequencies (>2000 Hz).

Consider a band of critical bandwidth centered at a frequency F. If the
frequency bands utilized in the present invention are much smaller than the
critical bandwidth, then the critical frequency band in question will be made up of a
plurality of sub-bands, each with a different phase shift, $\phi_i$. The intensity of the
sound in the band will be the average of the intensities of each of the sub-bands.
Each sub-band will have an intensity which has been modified by the
constructive or destructive interference resulting from the combining of the sound fields
from the two speakers. This intensity will vary from 0 to 100 percent of the
intensity that would have been present had the interference not taken place.

The undesired coloration results when the average intensity from band to 
band changes as a result of the interferences occurring at the sub-band level in 
each band. If the sub-bands were so small that there is a very large number of 
sub-bands in each band, then the change in average intensity from band to band 
would be negligible. 

This may be seen as follows. The average intensity of each band is the
average of the intensities of each sub-band. The intensity of each sub-band is
reduced by a factor which is a function of a randomly selected variable, i.e., the
$\phi_i$. It is well known in the statistical arts that the standard deviation of the
average of a function of a random variable for N values of the function goes to
zero as N is increased to infinity. Thus, the variation from band to band is
reduced as the number of sub-bands is increased.
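
The statistical argument can be checked with a small simulation. The sketch assumes that each sub-band's intensity is scaled by (1 + cos(phi)) / 2, which is the intensity of two equal-amplitude signals with phase difference phi relative to the fully constructive case, and that phi is uniform over a full cycle. Both assumptions, the number of simulated bands, and the function names are illustrative choices rather than values from the text.

```python
import math
import random
import statistics


def band_intensity(num_sub_bands, rng):
    """Average intensity factor of one band made of num_sub_bands sub-bands.

    Each factor is (1 + cos(phi)) / 2 with phi uniform on [-pi, pi].
    """
    factors = [(1.0 + math.cos(rng.uniform(-math.pi, math.pi))) / 2.0
               for _ in range(num_sub_bands)]
    return sum(factors) / num_sub_bands


def band_to_band_spread(num_sub_bands, num_bands=200, seed=1):
    """Standard deviation of the band averages across num_bands simulated bands."""
    rng = random.Random(seed)
    averages = [band_intensity(num_sub_bands, rng) for _ in range(num_bands)]
    return statistics.pstdev(averages)


if __name__ == "__main__":
    for n in (1, 16, 256):
        print(n, round(band_to_band_spread(n), 3))
```

The spread shrinks roughly in proportion to one over the square root of the number of sub-bands, which is the behavior the argument above relies on.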

Therefore, the coloration due to spatial interference of the sound waves 
produced by the left and right loudspeakers is minimized by reducing the 
bandwidth. As a result, one can not choose a bandwidth which simultaneously 
minimizes the timbral shifts from both factors. 

In the preferred embodiment of the present invention, the bandwidth is
chosen experimentally between about 50 Hz and twice the critical bandwidth.
However, bandwidths as large as 4 critical bandwidths will function. If spatial
interference coloration is small, then the larger bandwidth is found
experimentally to be more desirable. This will be the case when the listener is equidistant
from the loudspeakers or wears headphones.
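
A band layout following these figures might be generated as sketched below. The sketch takes the critical-bandwidth approximation quoted earlier literally (about 100 Hz below 2000 Hz, about one seventh of the center frequency above), assigns the value at exactly 2000 Hz to the upper branch, and uses each band's lower edge in place of its center frequency. These simplifications and the function names are assumptions; the text states that the bandwidth is chosen experimentally.

```python
def critical_bandwidth_hz(center_hz):
    """Approximate critical bandwidth using the two-regime rule from the text."""
    if center_hz <= 0:
        raise ValueError("frequency must be positive")
    if center_hz < 2000.0:
        return 100.0
    return center_hz / 7.0


def band_edges_hz(low_hz, high_hz, critical_bands_per_band=2.0, min_width_hz=50.0):
    """Ascending band edges covering [low_hz, high_hz].

    Each band is critical_bands_per_band critical bandwidths wide, but never
    narrower than min_width_hz; the last band is clipped at high_hz.
    """
    edges = [float(low_hz)]
    while edges[-1] < high_hz:
        width = max(min_width_hz,
                    critical_bands_per_band * critical_bandwidth_hz(edges[-1]))
        edges.append(min(edges[-1] + width, float(high_hz)))
    return edges


if __name__ == "__main__":
    print(band_edges_hz(100.0, 4000.0))
```

Raising `critical_bands_per_band` toward 4 corresponds to the larger bandwidths said to be preferable when spatial interference coloration is small.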

In addition, the question of an optimal bandwidth must be examined from 
the standpoint of sound material being processed. In the case of periodic and 
quasi-periodic tones such as speech and most musical instruments it is useful to 
organize the bands such that the harmonically related partials fall into separate 
bands. The fundamental will fall into a first band, the second harmonic into 
another, and so on. The limit to this rule is that bands not become smaller than 
a critical bandwidth since higher harmonics will naturally fall together into a 
single critical band. In the case of non-periodic or noise-like sounds, there is no 
fundamental. In this case, partials will likely fall into every adjacent band. It is 
useful that these bands be as small as possible and again that the phase of these 
adjacent bands shift rapidly. Experience has shown that the optimal bandwidth 
for non-periodic sounds is two critical bands wide. 

We will now return to the issue of alternate methods of selecting phase 
shifts. The sound material being processed suggests different strategies. For 
periodic sounds, the non-adjacent bands containing harmonics should be phase 
shifted so that each partial is in a different spatial location. For non-periodic 
sounds, each adjacent band should be phase shifted so that adjacent bands of 
partials are in different spatial locations. Both strategies can be addressed 
together. Table 1 provides a list of center frequencies for bands and indications
for left/right leading phase shifts such that adjacent bands lead in different direc-
tions and fundamental and second harmonics fall into non-adjacent bands leading
in different directions up to the limit of critical band spacing. The left channel
is defined to lead the right channel if (φ_L - φ_R) > 0. In the case in which a
phase-shifted output signal is generated from the input signal, one of the φ's will
be zero. Hence, this is equivalent to requiring that the phase-shifts added to the
frequency bands be chosen such that no three adjacent frequency bands are
given phase shifts with the same sign.

The exact phase shifts for each band can also be prescribed. For example, a
vivid stereo separation for frequencies below 1,500 Hz may be achieved by
selecting a phase value such as +π (or -π) for each band. Alternatively, the
phase shifts can be selected by choosing a random phase shift between 0 and fπ
for the L bands and 0 and -fπ for the R bands, where f determines the apparent
width of the image. If the image is to appear to emanate from a location be-
tween the speakers and the listener, a constant can be added to each phase shift
in a manner analogous to that described above.
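
The random selection of phase shifts described above may be sketched as follows, purely as an illustration. The function names, the strictly alternating L/R pattern (a stand-in for the assignments of Table 1, which is not reproduced here), and the use of a uniform distribution are assumptions rather than details taken from the text.

```python
import math
import random


def alternating_leads(n_bands):
    """Simplest pattern in which adjacent bands lead in different directions."""
    return ["L" if i % 2 == 0 else "R" for i in range(n_bands)]


def random_band_phases(leads, f, offset=0.0, rng=None):
    """Draw one shift per band: [0, f*pi] for 'L' bands, [-f*pi, 0] for 'R'.

    `f` sets the apparent width of the image; `offset` is the optional
    constant added to every shift.
    """
    rng = rng or random.Random()
    phases = []
    for lead in leads:
        magnitude = rng.uniform(0.0, f * math.pi)
        phases.append(offset + (magnitude if lead == "L" else -magnitude))
    return phases


def no_three_same_sign(phases):
    """True when no three adjacent bands carry shifts of the same sign."""
    signs = [(p > 0) - (p < 0) for p in phases]
    return not any(
        a == b == c != 0 for a, b, c in zip(signs, signs[1:], signs[2:])
    )


leads = alternating_leads(6)
phases = random_band_phases(leads, 0.5, rng=random.Random(0))
```

The check in `no_three_same_sign` applies to the shifts before any constant offset is added, since a large offset changes the sign of every shift.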

The above described embodiments of the present invention utilize band- 
pass filters and phase shift circuits. The same result may be obtained, however, 
by convolving x(t) with a filter function h(t) to produce y(t). That is, 

    y(t) = \int x(t - z) h(z) \, dz        (2)

The transformation function h(z) provides the phase shifting of the individual
frequency bands.

The present invention preferably utilizes a digital input signal. If the 
signal source consists of an analog signal, it may be converted to digital form 
via a conventional analog-to-digital converter. In this case, each output signal 
consists of a sequence of digital values. The ith value for each output signal
corresponds to the value of the output signal at a time iT, where T is the time 
between digital samples. In this case, the convolution operation given in Eq. (2) 
reduces to

    y(nT) = y_n = \sum_{m=0}^{N-1} x_{n-m} h_m        (3)

where m runs from 0 to N-1. The filter coefficients, h_m, are calculated from

    h_m = (1/N) \sum_{k=0}^{N-1} \exp(k m \omega + \phi_k)        (4)

Here, k runs from 0 to N-1, ω = 2π/N, exp(a) = e^{ia}, and N is the total number
of frequency samples.

In the above described preferred embodiment of the present invention,
only one of the output signals is obtained from the input signal by processing the 
input signal, the other output signal being identical to the input signal. The 
output signal that is identical to the input signal can be delayed in time to 
compensate for the overall delay introduced by the processing. In the case that 
the processing is performed by convolution, this delay will be approximately 
equal to half the length of the convolution sequence. 
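
As a non-authoritative sketch, the discrete convolution of Eq. (3), the coefficient calculation of Eq. (4), and the compensating delay may be written as follows. The function names are hypothetical, exp(a) is read as the complex exponential of ia, and the mirroring of the phase shifts (so that the coefficients come out real) is an assumption that the text does not state.

```python
import cmath
import math


def filter_coefficients(phi):
    """Eq. (4): h_m = (1/N) * sum_k exp(i*(k*m*w + phi_k)), with w = 2*pi/N."""
    n = len(phi)
    w = 2.0 * math.pi / n
    return [
        sum(cmath.exp(1j * (k * m * w + phi[k])) for k in range(n)) / n
        for m in range(n)
    ]


def mirrored_phases(half):
    """Assumption (not in the source): set phi[N-k] = -phi[k] so h is real.

    `half` holds the shifts for k = 1 .. N/2 - 1; bins 0 and N/2 stay at 0.
    """
    n = 2 * (len(half) + 1)
    phi = [0.0] * n
    for k, p in enumerate(half, start=1):
        phi[k] = p
        phi[n - k] = -p
    return phi


def convolve(x, h):
    """Eq. (3): y_n = sum_m x_{n-m} h_m, taking x as zero before sample 0."""
    return [
        sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
        for n in range(len(x))
    ]


def delayed_copy(x, n_taps):
    """Delay the unprocessed channel by about half the filter length."""
    d = min(n_taps // 2, len(x))
    return [0.0] * d + list(x[: len(x) - d])


phi = mirrored_phases([0.5, -0.5, 0.5])          # N = 8 frequency samples
h = [c.real for c in filter_coefficients(phi)]   # imaginary parts are ~0
x = [1.0] + [0.0] * 15                           # unit impulse as test input
processed = convolve(x, h)                       # phase-processed channel
unprocessed = delayed_copy(x, len(h))            # delayed, otherwise identical
```

With these coefficients the discrete Fourier transform of h has unit magnitude and phase φ_k at every frequency sample, so the processed channel keeps the amplitude of the input and differs from it only in phase.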

In the preferred embodiment, the cross-correlation measure value is 
determined by the relationship of the processed output channel to the unproc- 
essed output channel. That is, one of the output channels is not phase-proc- 
essed. It is found experimentally that the presence of an unprocessed channel 
reduces the perceived effect of any small timbral shifts. Those skilled in the art
will also recognize that the same interchannel relationship can be achieved in an 
implementation in which both output signals are processed. In such an imple- 
mentation, the phase characteristics we have described for the processed signal 
in the preferred embodiment are implemented such that the interchannel phase 
differences satisfy the conditions in question. 

Although the above embodiments of the present invention have been de-
scribed with reference to stereophonic output signals, it will be apparent to those
skilled in the art that the principles described above may be utilized for provid-
ing more than two output signals. For example, in theatrical sound systems four
or more output channels are often utilized. Each of the output channels can be
processed by a phase-processor according to the present invention. Each phase-
processor would utilize its own set of phase shifts, φ_i. Each such set of phase
shifts would be different from those used by other said phase-processors.

Unlike prior art systems, the perceptual effects obtained with the present 
invention are resilient in loudspeaker reproduction, even when the listeners are 
far off the line equidistant between the two loudspeakers and even when the 
reproduction environment is reverberant. Experiments have shown that the 
effect is present even when the distance between the listener and each of the 
loudspeakers differs by as much as 15 meters in typical reproduction settings. 


The output signals provided by the present invention may be played
through conventional speakers or headphones. These signals may also be re-
corded onto conventional stereophonic recording media for subsequent play-
back through conventional stereophonic equipment. Such an audio recording
has at least two channels. When the final mixing is completed, each sound track
can be viewed as being composed of two signals Q(t) and R(t). The Q(t) signal 
is the result of the processing by the present invention. The R(t) signal repre- 
sents other processing such as mixing channels of information which are not 
processed by the present invention. 

Consider a recording having two channels. The first sound track would be
composed of Q_1(t) and R_1(t) and the second sound track would be composed of
Q_2(t) and R_2(t). The amplitudes of the signals Q_1(t) and Q_2(t) at any given
frequency will be denoted by A_1(f) and A_2(f). In a recording according to the
present invention, A_1(f) = gA_2(f), where g is a gain related constant. The phase
of Q_1(t) at any given frequency f will differ from that of Q_2(t) by an amount
φ(f), where φ(f) varies between P-δP and P+δP, and φ(f) is a rapidly changing
function of frequency.

For the purposes of this discussion, φ(f) is defined to be rapidly varying if
the following criteria are met. Consider the frequency spectrum as being
broken into bands of width no larger than four critical bandwidths. Consider
the average value of φ(f) in any given band. φ(f) is said to be a rapidly varying
function of f if, on average, the sign of the difference in the average value of
φ(f) between bands is zero. If this criterion is met, then, on average, adjacent
frequency bands will lead through different speakers when P is 0.
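
One possible reading of this criterion is sketched below, as an illustration only. It assumes that the band averages of φ(f) have already been computed, that "between bands" means between adjacent bands, and that the criterion is met when the signs of those differences average to zero; the function names and the tolerance parameter are hypothetical.

```python
def mean_sign_of_band_steps(band_means):
    """Average of sign(mean[i+1] - mean[i]) over adjacent band pairs."""
    steps = [b - a for a, b in zip(band_means, band_means[1:])]
    signs = [(s > 0) - (s < 0) for s in steps]
    return sum(signs) / len(signs)


def is_rapidly_varying(band_means, tolerance=0.0):
    """True when rises and falls between adjacent bands balance out."""
    return abs(mean_sign_of_band_steps(band_means)) <= tolerance


alternating = [0.5, -0.5, 0.5, -0.5, 0.5]   # band averages that flip each band
monotonic = [0.0, 1.0, 2.0, 3.0, 4.0]       # band averages that only increase
```

Under this reading the alternating sequence satisfies the criterion, while the steadily increasing one does not.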

There has been described herein a novel audio processing method
and apparatus. Various modifications to the present invention will become
apparent to those skilled in the art from the foregoing description and accompa-
nying drawings. Accordingly, the present invention is to be limited solely by
the scope of the following claims.

WHAT IS CLAIMED IS:

1. An audio processing system for generating a plurality of output channel 
signals from an input sound signal, said system comprising: 

means for receiving said input sound signal; 

phase-processing means for generating a phase-processed signal compris- 
ing a signal which is substantially equal to the sum of M band-limited signals, 
the ith said band-limited signal having an amplitude substantially equal to that of 
said input sound signal in a predetermined frequency range f_i ± δf_i and a phase
which differs from the phase of said input sound signal in said predetermined
frequency range by an amount φ_i, i running from 1 to M, wherein M>2 and φ_i
is chosen between P-δP and P+δP, wherein φ_i is a rapidly varying function of i;
and 

means for generating one of said channel signals from said phase-proc- 
essed signal. 

2. The system of Claim 1 wherein said δf_i are chosen such that δf_i is less
than twice the critical bandwidth at f_i for all values of i for which f_i is less than
some predetermined frequency.

3. The system of Claim 1 wherein said δf_i are chosen such that δf_i is
substantially equal to the critical bandwidth at f_i for all values of i for which f_i is
less than some predetermined frequency.

4. The system of Claim 1 wherein said f_i and δf_i are chosen such that
harmonically related partials are in different said frequency ranges for frequen-
cies below a predetermined frequency.

5. The system of Claim 1 further comprising delay means for generating
a delayed signal having an amplitude and phase substantially equal to that of said
input signal; and

means for generating one of said channel signals from said delayed signal,
said generated channel signal being different from the channel signal generated
from said phase-processed signal.

6. A method for processing an input sound signal to generate a plurality
of output channel signals, said method comprising the steps of:

receiving said input sound signal;

generating a phase-processed signal which is substantially equal to the sum
of M band-limited signals, the ith said band-limited signal having an amplitude
substantially equal to that of said selected input sound signal in a predetermined
frequency range f_i ± δf_i and a phase which differs from the phase of said input
signal in said predetermined frequency range by an amount φ_i, i running from 1
to M, wherein M>2 and φ_i is chosen between P-δP and P+δP, wherein φ_i is a
rapidly varying function of i; and

generating one of said channel signals from said phase-processed signal. 

7. The method of 6 wherein said δf_i are chosen such that δf_i is less than
twice the critical bandwidth at f_i for all values of i for which f_i is less than some
predetermined frequency.

8. The method of 6 wherein said δf_i are chosen such that δf_i is substantial-
ly equal to the critical bandwidth at f_i for all values of i for which f_i is less than
some predetermined frequency.

9. The method of 6 wherein said f_i and δf_i are chosen such that harmoni-
cally related partials are in different frequency ranges for frequencies below a
predetermined frequency.

10. The method of 6 further comprising the step of generating a delayed
signal from input sound signal, and generating one of said channel signals from
said delayed signal, said generated channel signal being different from the
channel signal generated from said phase-shifted signal.

11. An audio recording comprising first and second channels, said first
channel comprising a signal which is the sum of two signals, Q_1(t) and R_1(t) and
said second channel comprising the sum of two signals Q_2(t) and R_2(t), wherein
A_1(f) = gA_2(f), A_1(f) being the intensity of Q_1(t) at frequency f, A_2(f) being the
intensity of Q_2(t) at frequency f, and g being a constant, the phase of Q_1(t) at
any given frequency f differs from that of Q_2(t) by an amount φ(f), varying
between P-δP and P+δP, and φ(f) being a rapidly changing function of frequen-
cy f.